This notebook documents preliminary analysis of tracking data for fish tagged in Molokini Crater between 2020-05-16 and 2021-05-24.
The purpose of this study is to understand how human impacts affect the fish of Molokini Crater.
We are particularly interested in answering the following question: 1. Is the presence of fish affected by vessel presence?
Proposed Approach:
1. Begin by calculating the number of each species tagged and basic summary statistics.
2. Calculate metrics:
    - Receiver use
    - Pianka's niche overlap
    - Residency
3. Make the following plots:
    - Map: receiver locations
    - Map: average receiver use by species
    - Scatterplot: day/night plots
    - Bar plot: number of detections per day (individual)
    - Bar plot: number of individuals detected (species)
    - Line chart: proportion of individuals detected n days after tagging (30-day moving average by species)
    - Bar plot: daily vessel traffic
    - Scatterplot: vessel traffic vs. proportion of fish detected in the crater daily (by species)
4. Perform the following statistical tests:
    - Compare residency rates by species
    - Compare residency by species, size, and time at liberty
    - Create a GLM regressing the number of individuals in the crater against boat traffic and species, using an AR(1) term on the dependent variable at some time scale (daily? 6 hours? depends on the resolution of the vessel data)
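## Packages used below. Helper functions such as load_vemco_data(), get_time_of_day() and plot_day_night() are not defined in this notebook.
library(data.table)  # uniqueN()
library(dplyr)       # inner_join(), left_join()
library(reshape2)    # dcast(), melt(); loaded after data.table so the reshape2 versions are used
library(ggplot2)
library(ggmap)       # get_map(), ggmap()
project_directory = '/Users/stephenscherrer/Documents/Programming/Projects/Molokini'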
scripts_directory = file.path(project_directory, 'Analysis Scripts')
data_directory = file.path(project_directory, 'Data')
results_directory = file.path(project_directory, 'Results')
figure_directory = file.path(results_directory, 'Figures')
## Files from VUE
molo_df = load_vemco_data(file.path(data_directory, 'VUE_Export.csv'))
false_detections_df = load_fdf_report(file.path(data_directory, 'FDA.csv'))
## Vessel Traffic data
vessel_df = load_vessel_data(file.path(data_directory, "Molokini_Master_June_21.csv"))
## Metadata Files
tagging_df = load_tagging_data(file.path(data_directory, 'Molokini_Fish_Tagging_master.xlsx'))
# receiver_df = load_receiver_data(file.path(data_directory, ))
## Associate detections with time of day
molo_df = get_time_of_day(molo_df)
## Combine vue df with tagging df - remove irrelevant tags in the process
molo_df = inner_join(x = molo_df, y = tagging_df[ ,c('tag_id', 'species', 'fork_length', 'tagging_date' )], by = 'tag_id')
## Filter false detections
# molo_df = filter_false_detections(molo_df)
## Reclass vessel_df date because R keeps implicitly recasting values...
vessel_df$date = as.POSIXct(as.numeric(vessel_df$date), origin = '1970-01-01', tz = 'HST')
## remove vessel data before start of study
vessel_df = vessel_df[vessel_df$date >= min(molo_df$datetime), ]
## Get count of individuals tagged by species
tags_by_species = aggregate(tag_id ~ species, data = tagging_df, FUN = uniqueN)
colnames(tags_by_species) = c('species', 'tagged')
## Merge with count of individuals detected by species
detected_by_species = aggregate(tag_id ~ species, data = molo_df, FUN = uniqueN)
colnames(detected_by_species) = c('species', 'detected')
tags_by_species = left_join(tags_by_species, detected_by_species, by = 'species')
## Replace NA values with 0
tags_by_species[is.na(tags_by_species)] = 0
print(tags_by_species)
# Time at liberty
time_at_liberty = calculate_time_at_liberty(molo_df)
# Days Detected
days_detected = calculate_days_detected(molo_df)
# % of days detected
detection_stats = merge(x = days_detected, y = time_at_liberty[ ,c('tag_id', 'days_at_liberty')], by = 'tag_id')
detection_stats$percent_days_detected = round(detection_stats$unique_days / detection_stats$days_at_liberty, 4) * 100
# Merge with tagging data to get fish info
detection_stats = merge(x = tagging_df[ ,c('tagging_date', 'species', 'tag_id', 'fork_length')], y = detection_stats, by = 'tag_id')
detection_stats = detection_stats[order(detection_stats$species, detection_stats$tagging_date, detection_stats$tag_id), ]
print(detection_stats)
## Receiver use index: detections of a tag at a given receiver / all detections of that tag (computed for all species and all individuals)
## Calculate unique detections per tag per receiver station
detections_per_tag_per_receiver = aggregate(datetime~tag_id+receiver+species, data = molo_df, FUN = uniqueN)
colnames(detections_per_tag_per_receiver) = c('tag_id', 'receiver', 'species', 'detections')
## Calculate receiver use metric for each fish and receiver pair
## receiver_use = detections at this receiver / total detections of the same tag
## (vectorized with ave(); replaces a redundant outer loop over species that recomputed every row once per row)
detections_per_tag_per_receiver$receiver_use = detections_per_tag_per_receiver$detections /
  ave(detections_per_tag_per_receiver$detections, detections_per_tag_per_receiver$tag_id, FUN = sum)
## Calculate average receiver use metric for each tag - Omit stations with no use as this would bias metric
indvidual_receiver_use = aggregate(receiver_use~tag_id+species, data = detections_per_tag_per_receiver[detections_per_tag_per_receiver$receiver_use > 0, ], FUN = mean)
## Add this information to detection_stats
detection_stats = merge(detection_stats, indvidual_receiver_use, by = c('tag_id', 'species'))
## Calculate receiver use metric by species
species_receiver_use = aggregate(receiver_use~species, data = indvidual_receiver_use, FUN = mean)
colnames(species_receiver_use) = c('species', 'receiver_use')
print(species_receiver_use)
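To make the receiver use metric concrete, here is a minimal sketch of the same calculation on a small hypothetical data frame (`toy`, its tags, receivers, and counts are invented for illustration, not study data). It computes each tag's share of detections per receiver, averages over the receivers where the tag was heard, then averages those per-tag values within species.

```r
## Hypothetical toy data: detections per tag per receiver (NOT study data)
toy = data.frame(
  tag_id     = c('A', 'A', 'A', 'B', 'B'),
  receiver   = c('R1', 'R2', 'R3', 'R1', 'R2'),
  species    = c('ulua', 'ulua', 'ulua', 'omilu', 'omilu'),
  detections = c(50, 30, 20, 10, 90)
)

## Receiver use = detections at a receiver / all detections of that tag
toy$receiver_use = toy$detections / ave(toy$detections, toy$tag_id, FUN = sum)

## Mean receiver use per tag (only receivers where the tag was detected)
per_tag = aggregate(receiver_use ~ tag_id + species, data = toy, FUN = mean)

## Mean of the per-tag values within each species
per_species = aggregate(receiver_use ~ species, data = per_tag, FUN = mean)
print(per_species)
```

Because each tag's receiver use values sum to 1, their mean over the k receivers where the tag was detected is exactly 1/k (tag A: 1/3, tag B: 1/2). The individual average therefore measures how many receivers a fish used, not how unevenly it used them.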
Pianka's Niche Overlap (0 = no overlap, 1 = perfect overlap)
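For reference, the overlap computed in the loop below for species j and k is Pianka's index. Here p_{ij} is the average use index of species j at receiver i, taken from the wide table built from `receiver_use_aggregated_by_species`, with receivers a species never used set to 0:

```latex
O_{jk} = \frac{\sum_{i} p_{ij}\, p_{ik}}{\sqrt{\sum_{i} p_{ij}^{2} \, \sum_{i} p_{ik}^{2}}}
```

Because every p_{ij} is non-negative, O_{jk} lies between 0 (no shared receivers) and 1 (proportional use of the same receivers).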
## Aggregate data averaged by species
receiver_use_aggregated_by_species = aggregate(receiver_use ~ species + receiver , data = detections_per_tag_per_receiver, FUN = mean)
colnames(receiver_use_aggregated_by_species) = c('species', 'receiver', 'avg_use_index')
## Reshape from Long to Wide format
receiver_use_aggregated_by_species_wide = dcast(receiver_use_aggregated_by_species, species ~ receiver, value.var = 'avg_use_index')
## Get all species combinations
species_combos = data.frame()
for (i in 1:nrow(receiver_use_aggregated_by_species_wide)){
if(i != nrow(receiver_use_aggregated_by_species_wide)){
for (j in (i+1):nrow(receiver_use_aggregated_by_species_wide)){
species_combos = rbind(species_combos, data.frame('species_1' = receiver_use_aggregated_by_species_wide$species[i], 'species_2' = receiver_use_aggregated_by_species_wide$species[j]))
}
}
}
## Change any NA values to zero
receiver_use_aggregated_by_species_wide[is.na(receiver_use_aggregated_by_species_wide)] = 0
## Calculate Pianka's index for all pairs
species_combos$pianka_index = 0
for(i in 1:nrow(species_combos)){
species_combos$pianka_index[i] = sum(receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_1[i], -1] *
receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_2[i], -1]) /
(sqrt(sum(receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_1[i], -1] ^ 2) *
sum(receiver_use_aggregated_by_species_wide[receiver_use_aggregated_by_species_wide$species == species_combos$species_2[i], -1] ^ 2)))
}
## Round to 3 digits
species_combos$pianka_index = round(species_combos$pianka_index, 3)
print(species_combos)
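The pairwise loop above can also be sketched as a small function applied over `combn()` pairs. This is an illustrative rewrite on a hypothetical 3-species by 3-receiver use matrix (`use`, `sp_a` to `sp_c`, and `R1` to `R3` are invented names), not the notebook's data.

```r
## Pianka's overlap for two resource-use vectors (receivers act as "resources")
pianka = function(p_j, p_k) {
  sum(p_j * p_k) / sqrt(sum(p_j^2) * sum(p_k^2))
}

## Hypothetical use matrix: rows = species, columns = receivers (NOT study data)
use = matrix(c(0.6, 0.4, 0.0,
               0.0, 0.5, 0.5,
               0.6, 0.4, 0.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c('sp_a', 'sp_b', 'sp_c'), c('R1', 'R2', 'R3')))

## All unordered species pairs, equivalent to the nested loop building species_combos
species_pairs = t(combn(rownames(use), 2))
overlap = data.frame(species_1 = species_pairs[, 1], species_2 = species_pairs[, 2])
overlap$pianka_index = mapply(function(a, b) pianka(use[a, ], use[b, ]),
                              overlap$species_1, overlap$species_2,
                              USE.NAMES = FALSE)
overlap$pianka_index = round(overlap$pianka_index, 3)
print(overlap)
```

Species `sp_a` and `sp_c` use receivers identically, so their index is 1. `sp_a` and `sp_b` share only receiver R2, giving 0.2 / sqrt(0.52 * 0.5), which rounds to 0.392.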
Study Area
## Plot study area and receivers
molo_basemap = get_map(location = c(lon = -156.496331, lat = 20.633007), zoom = 16, maptype = 'satellite')
## Plot one point per receiver location; ggsave() is called separately (adding it with + fails in current ggplot2)
receiver_map = ggmap(molo_basemap) +
  geom_point(data = unique(molo_df[ , c('lon', 'lat')]), mapping = aes(x = lon, y = lat), col = 'red') +
  labs(x = '°Longitude', y = '°Latitude', title = 'Receiver Locations')
ggsave(filename = 'Receiver Locations Google Map.pdf', plot = receiver_map, path = figure_directory)
print(receiver_map)
## Get average use of receiver by species
species_receiver_use = aggregate(receiver_use~species+receiver, data = detections_per_tag_per_receiver, FUN = mean)
colnames(species_receiver_use) = c('species', 'receiver' , 'receiver_use')
## Merge with lat lon positions for each receiver from molo_df
receiver_positions = unique(molo_df[ ,c('receiver', 'lat', 'lon')])
species_receiver_use = merge(x = species_receiver_use, y = receiver_positions, by = 'receiver', all.x = TRUE, all.y = FALSE)
## Make species plots for receiver use by species
for(spp in unique(species_receiver_use$species)){
  ## color is set outside aes() so the points are red rather than mapped to a "red" legend entry
  receiver_use_by_spp = ggmap(molo_basemap) +
    geom_point(data = species_receiver_use[species_receiver_use$species == spp, ],
               mapping = aes(x = lon, y = lat, size = receiver_use), color = 'red') +
    labs(x = '°Longitude', y = '°Latitude', title = paste('Receiver Use by', spp))
  ggsave(filename = paste('Receiver Use by ', spp, '.pdf', sep = ''), plot = receiver_use_by_spp, path = figure_directory)
  print(receiver_use_by_spp)
}
### Day Night Plots
## For all fish
pdf(file = file.path(figure_directory, 'Day Night Plot - All Fish.pdf'))
plot_day_night(molo_df, plot_title = 'All Fish')
dev.off()
## By Species
for (spp in unique(molo_df$species)){
  pdf(file = file.path(figure_directory, paste('Day Night Plot - Species ', spp, '.pdf', sep = '')))
  ## Subset directly by species (comparing tag_id against a shorter vector recycled it and selected the wrong rows)
  plot_day_night(molo_df[molo_df$species == spp, ], plot_title = spp)
  dev.off()
}
## By Individual
for (tag_id in unique(molo_df$tag_id)){
pdf(file = file.path(figure_directory, paste('Day Night Plot - Tag ID ', tag_id, '.pdf', sep = '')))
plot_day_night(molo_df[molo_df$tag_id == tag_id, ], plot_title = paste(tagging_df$species[tagging_df$tag_id == tag_id], '- Tag', as.character(tag_id), sep = ' '))
dev.off()
}
### Bar plot of detections in crater by date
detections_per_day_df = count_detections_per_date(molo_df)
## Barplot of detections by individual
for(i in 1:nrow(detections_per_day_df)){
  ## Convert from wide to long format (every column is a date)
  indv_data = melt(detections_per_day_df[i, ], measure.vars = colnames(detections_per_day_df))
  colnames(indv_data) = c('date', 'detections')
  ## Make and save plot (saving the plot object explicitly; '+ ggsave()' saved the previous plot)
  indv_plot = ggplot(data = indv_data, mapping = aes(x = date, y = detections)) +
    geom_bar(stat = "identity") +
    labs(title = paste('Tag', rownames(detections_per_day_df)[i]), x = 'Date', y = 'Detections')
  ggsave(filename = paste('Daily Detection Barplot -', rownames(detections_per_day_df)[i], '.pdf'), plot = indv_plot, path = figure_directory)
}
## Detections by species
detections_per_day_spp_stg = detections_per_day_df
detections_per_day_spp_stg$tag_id = rownames(detections_per_day_spp_stg)
detections_per_day_spp_stg = left_join(x = detections_per_day_spp_stg, tagging_df[ ,c('tag_id', 'species')], by = 'tag_id')
## Loop through species
for (spp in unique(detections_per_day_spp_stg$species)){
## Subset individual df by species
spp_subset_df = detections_per_day_spp_stg[detections_per_day_spp_stg$species == spp, -which(colnames(detections_per_day_spp_stg) %in% c('tag_id', 'species'))]
## Convert to long format
detections_per_spp = melt(colSums(spp_subset_df), value.name = 'detections')
detections_per_spp$date = rownames(detections_per_spp)
## Make and save plot
spp_plot = ggplot(data = detections_per_spp, mapping = aes(x = date, y = detections)) +
geom_bar(stat = "identity") +
labs(title = spp, x = 'Date', y = 'Detections')
ggsave(filename = paste('Daily Detection Barplot -', spp, '.pdf'), plot = spp_plot, path = figure_directory)
}
## Barplot of all detections
all_detections = colSums(detections_per_day_df)
## Convert to long format
all_detections_long = melt(all_detections, value.name = 'detections')
all_detections_long$date = rownames(all_detections_long)
## Make and save plot
all_detections_plot = ggplot(data = all_detections_long, mapping = aes(x = date, y = detections)) +
  geom_bar(stat = "identity") +
  labs(title = 'All Tagged Individuals', x = 'Date', y = 'Detections')
ggsave(filename = 'Daily Detection Barplot - all tags.pdf', plot = all_detections_plot, path = figure_directory)
THIS NEEDS WORK!!!
In the future, we might also consider the maximum number of vessels present at any one time.
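One way to get the maximum number of co-occurring vessels is a sweep over arrival (+1) and departure (-1) events, taking the running maximum. This is a minimal sketch that assumes each vessel visit has an arrival and a departure timestamp. The columns `arrival` and `departure` and the `visits` data are hypothetical; the structure of `vessel_df` may not support this directly.

```r
## Maximum number of vessels present simultaneously (sweep-line over events)
max_concurrent = function(arrival, departure) {
  times = c(arrival, departure)
  delta = c(rep(1, length(arrival)), rep(-1, length(departure)))
  ## At identical timestamps, process departures (-1) before arrivals (+1)
  ord = order(times, delta)
  max(cumsum(delta[ord]))
}

## Hypothetical visits (NOT study data)
visits = data.frame(
  date = as.Date(c('2020-06-01', '2020-06-01', '2020-06-01', '2020-06-02')),
  arrival = as.POSIXct(c('2020-06-01 08:00', '2020-06-01 09:00',
                         '2020-06-01 11:00', '2020-06-02 08:00'), tz = 'Pacific/Honolulu'),
  departure = as.POSIXct(c('2020-06-01 10:00', '2020-06-01 12:00',
                           '2020-06-01 13:00', '2020-06-02 09:00'), tz = 'Pacific/Honolulu')
)

## One maximum per day
max_vessels_per_day = sapply(split(visits, visits$date),
                             function(d) max_concurrent(d$arrival, d$departure))
print(max_vessels_per_day)
```

On 2020-06-01 the three visits overlap at most two at a time (09:00 to 10:00 and 11:00 to 12:00), so the daily maximum is 2. The unique-vessel count used below would report 3 for the same day.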
## Calculate Daily Vessel Stats
vessels_per_day = aggregate(vessel_name ~ date, data = vessel_df, FUN = uniqueN)
colnames(vessels_per_day) = c('date', 'daily_vessels')
# Make plot for total
## Plot the number of unique vessels recorded each day
total_vessels_plot = ggplot(data = vessels_per_day, mapping = aes(x = date, y = daily_vessels)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Number of Unique Vessels Daily', x = 'Date', y = '# of Vessels')
ggsave(filename = 'Total Vessels Daily.pdf', plot = total_vessels_plot, path = figure_directory)
print(total_vessels_plot)
## combine vessel and detections per individual
detections_per_day_per_tag = aggregate(datetime ~ date + tag_id + species, data = molo_df, FUN = uniqueN)
colnames(detections_per_day_per_tag) = c('date', 'tag_id', 'species', 'daily_detections')
detection_vessel_counts = left_join(detections_per_day_per_tag, vessels_per_day, by = 'date')
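detections_vs_vessels_plot = ggplot(data = detection_vessel_counts, mapping = aes(x = daily_vessels, y = daily_detections, color = species)) +
  geom_point() +
  labs(x = 'Vessels per day', y = 'Detections per day')
ggsave(filename = 'detections vs vessels.pdf', plot = detections_vs_vessels_plot, path = figure_directory)
print(detections_vs_vessels_plot)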
Conclusion: There does appear to be a negative trend, with fewer detections on days with more vessel traffic. This is based on inspection of the scatterplot and does not establish that vessels cause the decline. We need to dig deeper before trying to publish these results; the model should be better.
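The Proposed Approach calls for a GLM of the number of individuals in the crater against boat traffic and species, with an AR(1) term on the dependent variable. Below is a minimal sketch on simulated, hypothetical data (`daily`, `n_individuals`, `lag_individuals`, and the simulation parameters are all invented). It approximates the AR(1) term by adding the previous day's count as a covariate in a Poisson GLM.

```r
## Simulated, hypothetical data for illustration only (NOT study data)
set.seed(42)
n_days = 120
daily = expand.grid(day = 1:n_days, species = c('omilu', 'grey reef shark'),
                    stringsAsFactors = FALSE)
vessels = rpois(n_days, lambda = 8)                 # hypothetical daily vessel counts
daily$daily_vessels = vessels[daily$day]
daily$n_individuals = rpois(nrow(daily), lambda = exp(1.2 - 0.05 * daily$daily_vessels))

## AR(1)-style term: previous day's count for the same species (NA on day 1)
daily = daily[order(daily$species, daily$day), ]
daily$lag_individuals = ave(daily$n_individuals, daily$species,
                            FUN = function(x) c(NA, head(x, -1)))

## Poisson GLM; rows with a missing lag are dropped by glm's default na.action
traffic_glm = glm(n_individuals ~ daily_vessels + species + log1p(lag_individuals),
                  family = poisson(link = 'log'), data = daily)
print(summary(traffic_glm))
```

A lagged response used as a covariate is only an approximation. It is not an AR(1) error structure. Mixed-model packages (e.g., glmmTMB's `ar1()` covariance structure) can model the autocorrelation directly if this approach proves inadequate. The time step (daily vs. 6 hours) would still depend on the resolution of the vessel data.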
## Calculate residency
detection_stats$residence_metric = detection_stats$unique_days / detection_stats$days_at_liberty
## Assign residence category (Tinhan et al. 2014): Low < 1/3, Medium 1/3 to < 2/3, High >= 2/3
detection_stats$residence_category = 'Low'
for (i in 1:nrow(detection_stats)){
if (detection_stats$residence_metric[i] >= (1/3)) {
detection_stats$residence_category[i] = 'Medium'
}
if (detection_stats$residence_metric[i] >= (2/3)) {
detection_stats$residence_category[i] = 'High'
}
}
## Create grouped barplot of residency by species
residence_counts_by_species = aggregate(tag_id ~ species + residence_category, data = detection_stats, FUN = length)
ggplot(data = residence_counts_by_species, mapping = aes(x = species, y = tag_id, fill = residence_category)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = 'Species', y = 'Number of individuals', fill = 'Residence category')
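Takeaways: All 4 omilu were highly resident, as were the grey reef sharks. None of the other species (sandbar shark, ulua, whitetip reef shark) have replicates, so comparisons involving them should be interpreted with caution.
Calculate mean residency by species (regardless of time), then run an ANOVA by species and use Tukey's HSD to determine which differences are significant.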
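The mean-residency step is not computed explicitly in the cells below, which go straight to the ANOVA. A minimal sketch of a per-species summary (mean, standard deviation, and sample size) follows, shown on hypothetical toy values (`toy_stats` is invented, not study data).

```r
## Hypothetical toy values, NOT study data
toy_stats = data.frame(
  species = c('omilu', 'omilu', 'grey reef shark', 'grey reef shark', 'ulua'),
  residence_metric = c(0.90, 0.95, 0.80, 0.85, 0.70)
)

## Mean, SD and n of the residency index by species
residency_summary = aggregate(residence_metric ~ species, data = toy_stats,
                              FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))

## aggregate() stores the three statistics as a matrix column; flatten into ordinary columns
residency_summary = do.call(data.frame, residency_summary)
print(residency_summary)
```

A species with a single fish gets an `NA` standard deviation. This is the same lack of replicates noted above, and it means such a species contributes nothing to the ANOVA's estimate of within-species variance.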
## ANOVA model for residency metric by species
residence_by_species_anova = aov(residence_metric ~ species, data=detection_stats)
summary(residence_by_species_anova)
Df Sum Sq Mean Sq F value Pr(>F)
species 4 0.6781 0.16954 36.28 0.00212 **
Residuals 4 0.0187 0.00467
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## Tukey's Honestly Significant Differences between species
TukeyHSD(residence_by_species_anova)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = residence_metric ~ species, data = detection_stats)
$species
                                           diff        lwr        upr     p adj
omilu-grey reef shark                0.13134256 -0.1318229  0.3945080 0.3282773
sandbar shark-grey reef shark       -0.78499266 -1.1571648 -0.4128205 0.0034142
ulua-grey reef shark                -0.04021073 -0.4123829  0.3319614 0.9852738
whitetip reef shark-grey reef shark  0.01604564 -0.3561265  0.3882178 0.9995567
sandbar shark-omilu                 -0.91633521 -1.2560803 -0.5765901 0.0013197
ulua-omilu                          -0.17155329 -0.5112984  0.1681918 0.3205721
whitetip reef shark-omilu           -0.11529692 -0.4550421  0.2244482 0.6064486
ulua-sandbar shark                   0.74478192  0.3150345  1.1745293 0.0071653
whitetip reef shark-sandbar shark    0.80103830  0.3712909  1.2307857 0.0054544
whitetip reef shark-ulua             0.05625637 -0.3734910  0.4860037 0.9710009
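GLM comparing the residency index (dependent variable) across species. The intended model also includes time at liberty as an independent variable; the model fit below uses species only.
species_glm = glm(residence_metric ~ species, family = binomial(logit), data = detection_stats)
summary(species_glm)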
Call:
glm(formula = residence_metric ~ species, family = binomial(logit),
data = detection_stats)
Deviance Residuals:
1 2 3 4 5 6 7
0.00007 0.00007 0.00000 -0.26156 0.32896 -0.00022 0.00007
8 9
0.00000 0.00000
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.8652 2.0751 0.899 0.369
speciesomilu 4.0251 9.7553 0.413 0.680
speciessandbar shark -4.2953 4.2135 -1.019 0.308
speciesulua -0.3098 3.3547 -0.092 0.926
specieswhitetip reef shark 0.1458 3.7297 0.039 0.969
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.31486 on 8 degrees of freedom
Residual deviance: 0.17663 on 4 degrees of freedom
AIC: 11.401
Number of Fisher Scoring iterations: 8
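No significant differences between species were found (all coefficient p-values > 0.3), in contrast to the ANOVA and Tukey HSD results above.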
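The model above treats each fish's residency index as a single binomial trial, so R warns about non-integer successes and a fish tracked for 10 days counts the same as one tracked for 300. A minimal sketch of the usual alternative follows: pass the proportion as the response with `weights = days_at_liberty` (the number of trials), and add time at liberty as the covariate described in the planned model. The data frame `toy_stats` and its values are hypothetical, not study data.

```r
## Hypothetical toy data, NOT study data
toy_stats = data.frame(
  species = c('omilu', 'omilu', 'omilu', 'grey reef shark', 'grey reef shark', 'ulua'),
  unique_days = c(100, 150, 80, 60, 90, 20),
  days_at_liberty = c(120, 160, 100, 90, 110, 60)
)
toy_stats$residence_metric = toy_stats$unique_days / toy_stats$days_at_liberty

## Proportion response weighted by number of trials (days at liberty),
## so residence_metric * weights equals the integer count of days detected
residency_glm = glm(residence_metric ~ species + days_at_liberty,
                    family = binomial(link = 'logit'),
                    weights = days_at_liberty, data = toy_stats)
print(summary(residency_glm))
```

Detection days for a single fish are unlikely to be independent, so a `quasibinomial` family (which estimates a dispersion parameter) may be a safer choice than `binomial` if the residual deviance is much larger than its degrees of freedom.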